fix test/ops/self_attention.py #5
Merged
After implementing the KV cache, I found that the prefill phase produced the correct tokens but the decode phase did not. Inspecting the tensors, I traced the problem to the self-attention part, and finally to the softmax: the softmax portion of my self-attention operator was wrong when qlen != kvlen (i.e., when the KV cache is in use), yet it still passed the test in test/ops/self_attention.py. After adding past_len = total_len - seqlen to my implementation, inference became correct, but the self-attention test no longer passed. From this I concluded that test/ops/self_attention.py itself is also wrong.
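For reference, a minimal sketch of the corrected masking logic, assuming a single head and using the variable names from the description above (the actual operator interface is not part of this PR):

```python
import math
import torch

def cached_causal_attention(q, k, v):
    # q: (seqlen, d)       -- queries for the newly decoded tokens only
    # k, v: (total_len, d) -- keys/values including the cached prefix
    seqlen, d = q.shape
    total_len = k.shape[0]
    past_len = total_len - seqlen  # absolute offset of the new tokens

    scores = q @ k.T / math.sqrt(d)  # (seqlen, total_len)

    # Query i sits at absolute position past_len + i, so it may attend
    # to key j only when j <= past_len + i.
    i = torch.arange(seqlen).unsqueeze(1)     # (seqlen, 1)
    j = torch.arange(total_len).unsqueeze(0)  # (1, total_len)
    scores = scores.masked_fill(j > past_len + i, float("-inf"))

    return torch.softmax(scores, dim=-1) @ v  # (seqlen, d)
```

When qlen == kvlen (prefill, past_len = 0) this reduces to the ordinary causal mask, which is why the bug only shows up during decode.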
Analysis:

The mask in the previous test was a plain causal mask computed as if qlen == kvlen; the correct mask must offset the causal diagonal by past_len so that new queries can attend to the cached keys. Both are sketched below:
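A minimal sketch of the difference (shapes and names are illustrative, following the past_len = total_len - seqlen fix above):

```python
import torch

seqlen, past_len = 2, 3            # 2 new tokens on top of 3 cached ones
total_len = past_len + seqlen

i = torch.arange(seqlen).unsqueeze(1)     # (2, 1)
j = torch.arange(total_len).unsqueeze(0)  # (1, 5)

# Previous test: plain causal mask that ignores the cached prefix.
# Row 0 -> [T, F, F, F, F]: the first new token cannot even see the cache.
old_mask = j <= i

# Correct: shift the causal diagonal by past_len.
# Row 0 -> [T, T, T, T, F]: the first new token sees all cached keys plus itself.
new_mask = j <= past_len + i
```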

After fixing this part of the logic, both the self-attention and the infer CI tests pass.
Below is a screenshot of the passing CI:
